ABSTRACT

A quantitative method, meta-analysis, is used to
integrate research findings from a representative group of 38
coaching studies. Because different studies reported results on
different scales, effect size was used to transform all results to a
common metric. This meta-analysis showed that there are two distinct
literatures on the effectiveness of coaching programs. The first is
on the Scholastic Aptitude Test (SAT) and reports small effects from
coaching. The second covers other aptitude tests and shows that
coaching programs can have substantial effects. Studies that used a
pretest yielded larger estimates of pure coaching effects than did
other studies. This indicates that a pretest may be an important
component in any program designed to prepare students for aptitude
tests. Results support the conclusion that variation in study
findings is only modestly predictable from study characteristics.

Aptitude testing is a multimillion-dollar industry that
plays an important role in American education. Every year
millions of students take such tests as the Graduate Record
Examination, the Scholastic Aptitude Test, and the Law
School Admissions Test, and their lives are affected by the
results. In recent years a "coaching" industry has grown up
in the shadow of the testing establishment. This satellite
industry offers "coaching" and "crash" courses to help
students improve their chances of scoring high on admissions
and aptitude tests. The coaching industry is made up of at
least 150 independent firms, and it offers services for
50,000 students annually.

The testing and coaching industries embody different
beliefs about aptitude testing. According to the testers,
aptitude tests measure capacities that are developed
gradually from in-school and out-of-school experiences,
capacities that are not likely to be changed significantly
by short-term coaching (College Entrance Examination Board,
1968, p. 8). The coaching industry, on the other hand,
maintains that aptitude test scores can be raised by such
practices as test-familiarization, drill and practice,
instruction in test-taking strategy, and highly focussed
content teaching.

It is difficult to decide on the basis of individual
research studies which of these views is more reasonable.
Studies of coaching have been carried out in different
settings, with different experimental designs, and with
different results. Some studies have produced results that
support the testing industry's view on the modifiability of
aptitude test scores, whereas other studies produced results
that support the coaching industry's view.

Reviews of coaching studies have not resolved the
controversy. The first reviews were written in England and
supported the conclusion that coaching has a significant
influence on test performance. In one of the best of the
British reviews, Vernon (1954) reported that the average
effect of coaching and practice was to increase IQ scores by
8 to 9 points, or by about .6 standard deviations. Vernon
pointed out that such an effect could be achieved in a
remarkably short time, usually between 3 and 9 hours. More
recent reviews of coaching studies have focussed on the
widely used SAT. These reviews have generally emphasized
the futility of coaching. The trustees of the College
Board, for example, stated that the average increase to be
expected from an intensive coaching program of perhaps 15 to
20 hours would be .10 standard deviations (College Entrance
Examination Board, 1968).

There are at least two reasons for the inconsistency in
conclusions about the effects of coaching: (1) reviewers
have not examined the same studies; and (2) reviewers have
not analyzed the accumulated study results with quantitative
and statistical methods. Our study was meant to overcome
these limitations. It used a quantitative method,
meta-analysis, to integrate research findings from a large and
representative group of coaching studies.

Method 

The first step in our meta-analysis was to collect the
studies. We located in all 35 separate reports on 38
different studies. These reports came from journal
articles, dissertations, and ERIC documents.

The coaching procedures and tests in the 38 studies
were of several different types. We first classified the
studies according to these program and test features
(Table 1). The first of the variables, for example,
classified each study according to the level of training
intervention. At the lowest level were short test-taking
orientation sessions; at a somewhat higher level were longer
coaching programs that included intensive, concentrated
drill or "cramming" on sample test questions; and at the
highest level was instruction in broad cognitive skills. In
addition to test and program features, we coded
methodological characteristics of the studies, features of
the experimental populations, and publication features of
the reports (Table 2).

Because different studies reported results on different
scales, it was necessary to transform all the results to a
common metric. The metric that we employed was the Effect
Size, or ES. This measure expresses differences between
experimental and control scores in terms of
standard-deviation units.
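
As a minimal sketch of the transformation described above, the
following computes an effect size from group summary statistics.
The paper does not say which standard deviation served as the
unit; standardizing by the control-group standard deviation is an
assumption made here, and the function name and sample scores are
hypothetical.

```python
def effect_size(exp_mean: float, ctrl_mean: float, ctrl_sd: float) -> float:
    """Difference between experimental and control means in SD units.

    Assumption: the control-group standard deviation is the unit.
    """
    if ctrl_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (exp_mean - ctrl_mean) / ctrl_sd


# Hypothetical results reported on two differently scaled tests.
es_a = effect_size(exp_mean=520.0, ctrl_mean=500.0, ctrl_sd=100.0)
es_b = effect_size(exp_mean=103.0, ctrl_mean=100.0, ctrl_sd=15.0)
print(round(es_a, 2), round(es_b, 2))
```

Both hypothetical results come out to the same effect size even
though the raw score scales differ, which is the purpose of a
common metric.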

Results

The distribution of ES's was multi-modal in shape
(Figure 1). One of the modes was at .1 standard deviations;
another was at .4 standard deviations. The studies of
coaching for the SAT were clustered tightly around the
smaller mode; other studies were spread somewhat more
loosely around the larger mode (Figure 2). Coaching
programs for the SAT thus seemed to have different effects
from coaching programs for other tests.

Further examination of the data showed that studies of
SAT coaching were distinct from other studies in additional
ways. Compared to other studies, the SAT studies were
significantly more likely to involve long-term coaching,
field-tested coaching programs, coaching by a commercial
school, testing for a real-life educational decision, higher
grade levels, pre/post research designs, and research
carried out by ETS. Because SAT studies were different from
other studies both in features and in outcomes, we carried
out all further analyses separately on SAT studies and on
studies of other aptitude tests. Analysis of data from the
total group would have produced misleading results.

Coaching for the SAT

All of the SAT studies employed both pretests and
posttests. Improvement from initial to posttests averaged
.36 standard deviations for the experimental groups and .21
standard deviations for the control groups. The effect of
coaching alone, estimated from these 14 studies, was
therefore equal to .36 minus .21, or .15 standard-deviation
units.
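
The subtraction above can be written out as a short sketch. The
function name is hypothetical; the two gains are the averages
reported for the 14 SAT studies.

```python
def coaching_effect(exp_gain: float, ctrl_gain: float) -> float:
    """Pretest-to-posttest gain of coached groups minus the gain of
    uncoached groups, both in standard-deviation units."""
    return exp_gain - ctrl_gain


# Average gains reported in the text for the SAT studies.
sat_effect = coaching_effect(exp_gain=0.36, ctrl_gain=0.21)
print(f"{sat_effect:.2f}")  # prints 0.15
```

The control-group gain stands in for what practice alone produces,
so subtracting it leaves the part of the gain attributable to
coaching.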

None of the study features was significantly related to
size of effect in the SAT studies. Effects were similar for
SAT coaching programs of different durations and with
different characteristics. Findings were also similar in
groups of studies that used quite different methodologies or
that employed distinctly different subject groups. And
finally, findings were much the same for studies published
in different ways and at different times.

Coaching for Other Tests

Seventeen of the 24 studies of coaching for aptitude
tests other than the SAT employed both pretests and
posttests. Improvement from pretest to posttest averaged
.76 standard deviations for the experimental groups and .25
standard deviations for the control groups. The effect of
coaching alone, estimated from these 17 studies, was
therefore equal to .76 minus .25, or .51 standard-deviation
units. Studies that did not use pretests yielded a
significantly lower estimate of the size of coaching
effects. On the basis of all 24 studies, we estimated the
average ES of coaching to be .43.

The use of a pretest in the experimental design turned
out to be the only study feature significantly related to
size of effect. Other study features were not significantly
related to coaching outcomes.



Discussion

This meta-analysis showed that there are two distinct
literatures on the effectiveness of coaching programs. The
first is on the SAT and reports small effects from coaching.
The second covers other aptitude tests and shows that
coaching programs can have substantial effects.

The small SAT effects should not come as a surprise.
Reviewers of the SAT coaching literature have repeatedly
stated that the typical effect from SAT coaching is to raise
scores by between .1 and .2 standard-deviation units. Our
results are consistent with these other findings, and yet we
do not believe that the SAT is coach-proof. At least one
well-designed study, by Evans and Pike (1973), reported
substantial coaching effects on the SAT. This study was
carried out by ETS researchers who were thoroughly familiar
with the SAT item pool and who developed special coaching
materials for specific SAT item types. Other coachers have
not been as familiar with SAT items because ETS security
policies have until recently put SAT test forms out of their
reach. Recent changes in ETS policies give the public much
more access to SAT items and test forms, and it is possible
that we will in the future see greater success for SAT
coaching programs.

Some reviewers have speculated that program duration
can explain much of the variation in the outcomes of studies
of SAT coaching effectiveness. Effective coaching programs for
the SAT, they say, are long in duration while ineffective
programs are short. Messick and Jungeblut (1981), in fact,
have presented regression equations relating the logarithm
of program length to the gain attributable to coaching.
Using a pool of studies that differed slightly from Messick
and Jungeblut's pool, we were unable to replicate their
result. We do not believe therefore that the correlation
that Messick and Jungeblut found between program duration
and SAT effects is a robust one.

Our findings on coaching for other aptitude tests were
similar to findings presented by Vernon in 1954. According
to Vernon, practice and coaching can raise aptitude scores
by about .6 standard deviations (or 8 to 9 points on an IQ
scale). We found the average combined effect of
practice and coaching to be .76 standard deviations and the
average effect attributable to coaching alone to be .4
standard deviations.

Studies that used a pretest yielded larger estimates of
pure coaching effects than did other studies. In studies
with a pretest, effects attributable to coaching averaged
.51 standard deviations. In studies without pretests,
effects of coaching averaged .27 standard deviations. It
seems possible that the pretest acted to sensitize the
students to the information presented in the coaching
program. If so, a pretest may be an important component in
any program designed to prepare students for aptitude tests.
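
As a rough consistency check on these figures, the two subgroup
averages can be combined in proportion to the number of studies in
each subgroup (17 with a pretest, and the remaining 7 of the 24
without). This is a sketch under the assumption that the overall
average is a simple study-weighted mean; the paper does not state
how the overall figure was computed, and the subgroup values are
rounded. The function name is hypothetical.

```python
def weighted_mean(values, counts):
    """Mean of subgroup averages weighted by number of studies."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one study is required")
    return sum(v * n for v, n in zip(values, counts)) / total


# Reported subgroup averages: .51 (17 pretest studies), .27 (7 others).
overall = weighted_mean([0.51, 0.27], [17, 7])
print(f"{overall:.2f}")  # prints 0.44
```

The result, about .44, is close to the overall estimate of .43
reported for all 24 studies; a gap of this size is consistent with
rounding in the subgroup averages.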

We were not able to find other factors that influenced
study results. Although this failure was disappointing, it
was not unexpected. After examining results from numerous
meta-analyses, Glass, McGaw, and Smith (1981) concluded
reluctantly that the findings of contemporary research in
the social sciences often fit together poorly, and that
variation in study findings is only modestly predictable
from study characteristics. The results of our
meta-analysis support this conclusion. Even with the use of
objective tools for synthesis of findings, it was impossible
to explain fully why coaching results differ as much as they
do from study to study.

Table 1
Program and Test Features

Level of coaching

Duration of program in hours

Commercial vs. school program

Components of coaching program
--Testwiseness training
--Drill and practice
--Content teaching

Target test for coaching program
--Group vs. individual test
--Full test vs. subtest
--Teacher-made vs. standardized test
--SAT vs. other test

Table 2
Study Features

True vs. quasi-experiment
Pretest vs. posttest-only design
Laboratory vs. field study
ETS-sponsored vs. other research
New vs. field-tested program
Grade level of students
Ability level of students
Source of report
Year of report

Figure Captions

Figure 1. Distribution of coaching effects for 38
studies.

Figure 2. Distribution of coaching effects for 14 SAT
studies and 24 studies of other aptitude tests.



[Figure: axis label "Effect Size"; legend entries "SAT Studies" and "Other Studies"]